Multinomial Mixture Modelling for Bilingual Text Classification

Authors

  • Jorge Civera
  • Alfons Juan-Císcar
Abstract

Mixture modelling of class-conditional densities is a standard pattern classification technique. In text classification, the use of class-conditional multinomial mixtures can be seen as a generalisation of the Naive Bayes text classifier relaxing its (class-conditional feature) independence assumption. In this paper, we describe and compare several extensions of the class-conditional multinomial mixture-based text classifier for bilingual texts.
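
As a rough illustration of the idea in the abstract, the sketch below scores a bag-of-words count vector under a class-conditional multinomial mixture and picks the most probable class. It covers only the monolingual case; the bilingual extensions compared in the paper are not reproduced here. The function names, the toy vocabulary of three words, and all parameter values are hypothetical, and the parameters are assumed to be already estimated and strictly positive (smoothed). With a single component per class, the score reduces to the usual multinomial Naive Bayes score.

```python
import math

def log_multinomial_mixture(counts, priors, components):
    """Return log sum_i priors[i] * prod_d components[i][d] ** counts[d].

    The multinomial coefficient is omitted: it depends only on the
    document, not on the class, so it does not affect the decision.
    Assumes every probability is strictly positive (smoothed).
    """
    log_terms = [
        math.log(pi) + sum(x * math.log(p) for x, p in zip(counts, comp))
        for pi, comp in zip(priors, components)
    ]
    # Log-sum-exp for numerical stability on long documents.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def classify(counts, class_priors, models):
    """Pick the class maximising log p(c) + log p(counts | c)."""
    scores = {
        c: math.log(class_priors[c]) + log_multinomial_mixture(counts, *models[c])
        for c in models
    }
    return max(scores, key=scores.get)

# Hypothetical toy parameters: (component priors, per-component word probabilities).
class_priors = {"sport": 0.5, "politics": 0.5}
models = {
    "sport": ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),  # two components
    "politics": ([1.0], [[0.2, 0.6, 0.2]]),  # one component = Naive Bayes
}

label = classify([3, 0, 0], class_priors, models)
```

The two-component "sport" class can assign high probability both to documents dominated by the first word and to documents dominated by the third, which a single multinomial per class cannot do.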

Related resources

Model-Based Estimation of Word Saliency in Text

We investigate a generative latent variable model for model-based word saliency estimation for text modelling and classification. The estimation algorithm derived is able to infer the saliency of words with respect to the mixture modelling objective. We demonstrate experimental results showing that common stop-words as well as other corpus-specific common words are automatically down-weighted an...

Bilingual Text Classification using the IBM 1 Translation Model

Manual categorisation of documents is a time-consuming task that has been significantly alleviated with the deployment of automatic and machine-aided text categorisation systems. However, the proliferation of multilingual documentation has become a common phenomenon in many international organisations, while most of the current systems have focused on the categorisation of monolingual t...

Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...

Clustering Images with Multinomial Mixture Models

In this paper, we propose a method for image clustering using multinomial mixture models. The mixture of multinomial distributions, often called multinomial mixture, is a probabilistic model mainly used for text mining. The effectiveness of multinomial distribution for text mining originates from the fact that words can be regarded as independently generated in the first approximation. In this ...

A Generative Model for Self/Non-self Discrimination in Strings

A statistical generative model is presented as an alternative to negative selection in anomaly detection of string data. We extend the probabilistic approach to binary classification from fixed-length binary strings into variable-length strings from a finite symbol alphabet by fitting a mixture model of multinomial distributions for the frequency of adjacent symbols. Robust and localized change...

Publication date: 2006